Rapid hyperspectral photothermal mid-infrared spectroscopic imaging from sparse data for gynecologic cancer tissue subtyping

Ovarian cancer detection has traditionally relied on a multi-step process that includes biopsy, tissue staining, and morphological analysis by experienced pathologists. While widely practiced, this conventional approach suffers from several drawbacks: it is qualitative, time-intensive, and heavily dependent on the quality of staining. Mid-infrared (MIR) hyperspectral photothermal imaging is a label-free, biochemically quantitative technology that, when combined with machine learning algorithms, can eliminate the need for staining and provide quantitative results comparable to traditional histology. However, this technology is slow. This work presents a novel approach to MIR photothermal imaging that enhances its speed by an order of magnitude. Our method significantly accelerates data collection by capturing a combination of high-resolution and interleaved, lower-resolution infrared band images and applying computational techniques for data interpolation. We effectively minimize data collection requirements by leveraging sparse data acquisition and employing curvelet-based reconstruction algorithms. This method enables the reconstruction of high-quality, high-resolution images from undersampled datasets and achieves a 10X improvement in data acquisition time. We assessed the performance of our sparse imaging methodology using a variety of quantitative metrics, including mean squared error (MSE), structural similarity index (SSIM), and tissue subtype classification accuracies, employing both random forest and convolutional neural network (CNN) models, accompanied by ROC curves. Our statistically robust analysis, based on data from 100 ovarian cancer patient samples and over 65 million data points, demonstrates the method's capability to produce superior image quality and accurately distinguish between different gynecological tissue types with segmentation accuracy exceeding 95%.


Introduction
Mid-infrared spectroscopic imaging (MIRSI) is a class of quantitative, label-free, nondestructive techniques for acquiring spatially resolved chemical information from a sample.
Its utility extends across various fields, such as disease diagnosis, where it offers an alternative to histopathology [1][2][3][4][5][6][7][8][9], as well as material science [10][11][12], environmental and toxicological chemistry 13,14, and forensics 15,16. Fourier transform infrared (FT-IR) spectroscopic imaging is the best-known MIRSI technology and has been the de facto standard for spatially resolved molecular fingerprinting of organic molecules [6][7][8]17. FT-IR measurements typically cover MIR wavenumbers from 800 to 4000 cm⁻¹. However, the acquisition process is notably slow, and not every wavenumber offers distinct chemical information. Additionally, the resolution of FT-IR is constrained by the diffraction limit 18. For effective analysis, samples must be thin (around 5 µm) and dehydrated because of the substantial challenges posed by water absorption. Previous research has demonstrated that only a certain subset of wavenumbers contains the features necessary for deciphering the chemical composition of samples [19][20][21]. The adoption of Quantum Cascade Laser (QCL)-based Discrete Frequency IR (DFIR) imaging mitigates some of the limitations of FT-IR imaging by facilitating data acquisition at fewer wavenumbers, specifically those with chemically significant features [22][23][24][25]. The tunability and wavenumber selectivity offered by QCL sources enable DFIR instruments to acquire data at specific wavenumbers tailored to the application, thereby enhancing the speed of data acquisition. Despite these advancements, both DFIR and FT-IR are subject to a diffraction-limited spatial resolution of 5.5 µm.
The introduction of Optical Photothermal Infrared (O-PTIR) imaging [26][27][28][29][30] overcomes resolution limitations by providing a 0.5 µm spatial resolution and delivers information 100 times more detailed than that provided by FT-IR. O-PTIR imaging circumvents the IR diffraction limit using a pump and probe mechanism. The IR-induced photothermal effect alters the sample's optical properties, leading to changes in visible light intensity that are proportional to the sample's IR absorption. Detection is achieved through a coaxial and confocal visible (532 nm) light probe, illustrated in Figure 1.
Increasing the spacing for data sampling along the Y-dimension means less data is collected compared to uniformly sampled data at high resolution across both X and Y dimensions.
The third column of the table presents the percentage of data acquired relative to the original high-resolution images. This non-uniform sampling method, coupled with reconstruction algorithms 32, can effectively reduce acquisition time by leveraging the spatial and spectral sparsity inherent in MIRSI data 21.
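As a rough illustration of the data savings, the fraction of pixels acquired scales inversely with the Y spacing when the X spacing is fixed at 0.5 µm. The sketch below assumes that the amount of data is simply proportional to pixel count and ignores stage and instrument overheads. The function names and the list of spacings are illustrative assumptions and are not the entries of Table 1.

```python
FULL_SPACING_UM = 0.5  # pixel spacing along X, and along Y at full resolution


def data_fraction(y_spacing_um, full_spacing_um=FULL_SPACING_UM):
    """Fraction of pixels acquired relative to uniform full-resolution sampling."""
    if y_spacing_um < full_spacing_um:
        raise ValueError("Y spacing cannot be finer than the full-resolution spacing")
    return full_spacing_um / y_spacing_um


def cube_fraction(n_sparse_bands, y_spacing_um, n_full_bands=1):
    """Fraction of a hyperspectral cube acquired when most bands are sparse
    along Y and a few bands are kept at full resolution."""
    total_bands = n_sparse_bands + n_full_bands
    acquired = n_sparse_bands * data_fraction(y_spacing_um) + n_full_bands
    return acquired / total_bands


if __name__ == "__main__":
    for spacing in (0.5, 1.0, 2.0, 5.0, 10.0, 20.0):  # illustrative values only
        print(f"{spacing:5.1f} um: {100 * data_fraction(spacing):5.1f}% of pixels")
    # 27 sparse bands plus one full-resolution band, as used in this work
    print(f"27 sparse + 1 full band at 5 um: {100 * cube_fraction(27, 5.0):.1f}% of cube")
```

Under these assumptions, a 0.5 µm × 5 µm band holds one tenth of the pixels of a full-resolution band; measured acquisition times will differ from this ratio because of overheads.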
Validating our approach with multiple methodologies is essential for ensuring the robustness and generalizability of our methods. Therefore, we propose three independent metrics for assessing reconstruction accuracy: mean square error (MSE), structural similarity index measure (SSIM), and classification accuracy. MSE quantifies the average discrepancy between the reconstructed images and the original, ground-truth images. In contrast, SSIM evaluates the visual similarity and the presence of artifacts in the reconstructed images. The application of machine learning algorithms is pivotal in various domains, ranging from electronics 33 to cancer diagnosis 34. Given that one primary objective of our reconstruction is to enhance the segmentation accuracy of different cell types, we have employed machine learning algorithms and assessed their classification accuracy as an additional metric to ensure optimal reconstruction performance. We obtain data at multiple pixel spacings, measure reconstruction accuracies using the aforementioned metrics, and optimize our algorithms to achieve reliable performance. This reconstruction approach represents a novel and promising method capable of accelerating the acquisition of high-resolution spectroscopic data tenfold, thereby unlocking the full capabilities of the O-PTIR system.

Materials and Methods
An ovarian biopsy tissue microarray (TMA) was obtained from Biomax US (BC11115c) and imaged using a commercial O-PTIR system (Mirage, Photothermal Spec.). The TMA consists of paraffin-embedded cores mounted on a 1 mm thick CaF₂ substrate. These cores are from separate patients with cases of normal, hyperplastic, dysplastic, and malignant tumors. The patient cohort was composed of women aged 29 to 69; ovarian tumor stages
The adjacent H&E-stained TMA was imaged with a Nikon inverted optical microscope using a 10X, 0.4 NA objective in brightfield mode, and has diffraction-limited spatial resolution in the visible range (0.4 µm to 0.7 µm).

Sparse Image Reconstruction
We imaged tissue cores using sparse sampling along the y-axis to reduce O-PTIR imaging time, resulting in rectangular hyperspectral images. Using the curvelet transform, we reconstructed images to match the best resolution afforded by O-PTIR. These images were resized, registered, and then enhanced using an unsupervised curvelet transform, as illustrated in Figure 4. The images were acquired with a 0.5 µm spacing along the x-axis and variable spacing along the y-axis, ranging from 0.5 µm to 20 µm.

Interpolation
To reconstruct images, we initially rescale the raw rectangular images along the y-dimension to match a pixel size of 0.5 µm × 0.5 µm. This process involves computing the Fourier transform of each low-resolution (0.5 µm × 5 µm) band, then centering the lower frequencies in the Fourier domain. We utilize the high-resolution (0.5 µm × 0.5 µm) Amide I band (1660
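The rescaling step can be sketched as Fourier-domain interpolation along the y-axis. The text only states that the Fourier transform is computed and the low frequencies are centered; the zero-padding of the centered spectrum used below is our assumption about how the rescaling could be completed, and the function name `fourier_upsample_rows` is hypothetical.

```python
import numpy as np


def fourier_upsample_rows(band, factor):
    """Upsample a 2-D band image along axis 0 (y) by an integer factor.

    The spectrum along y is centred with fftshift, embedded in a larger
    zero-filled spectrum (assumed zero-padding), and transformed back.
    """
    n_rows, n_cols = band.shape
    new_rows = n_rows * factor
    # Centre the low frequencies along y
    spectrum = np.fft.fftshift(np.fft.fft(band, axis=0), axes=0)
    padded = np.zeros((new_rows, n_cols), dtype=complex)
    # Place the original spectrum so that its DC term lands on the new centre
    start = new_rows // 2 - n_rows // 2
    padded[start:start + n_rows, :] = spectrum
    upsampled = np.fft.ifft(np.fft.ifftshift(padded, axes=0), axis=0)
    # Rescale so intensities are preserved after the longer inverse transform
    return upsampled.real * factor


if __name__ == "__main__":
    # A 0.5 um x 5 um band rescaled to 0.5 um x 0.5 um needs a factor of 10 along y
    sparse_band = np.random.default_rng(0).random((20, 200))
    print(fourier_upsample_rows(sparse_band, 10).shape)  # (200, 200)
```

Interpolation of this kind is band-limited, so it cannot restore fine detail along y by itself, which is consistent with the blurring that the sharpening step is meant to address.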

Curvelet Transform
We applied a curvelet transform-based image sharpening algorithm to improve the quality of interpolated square images, which showed increased blurring along the y-axis with greater sampling distances. This method, inspired by our research on multi-modal fusion to enhance the spatial resolution of FT-IR images 31, was adapted to increase O-PTIR imaging speed through sparse sampling along the y-axis. The algorithm effectively enhances images by incorporating spatial information from high spatial resolution band images into lower spatial resolution images, aligning the quality with that of high-resolution images.
Our previous multi-modal image fusion study employed dark-field imaging to capture high-resolution spatial information, circumventing the diffraction-limited spatial resolution of FT-IR imaging 31. Given that O-PTIR can achieve a resolution of 0.5 µm, it allows us to avoid the previous challenges associated with integrating data from two distinct technologies, enabling the reconstruction of high-resolution 0.5 µm × 0.5 µm band images solely from sparse O-PTIR data. Furthermore, data from multiple O-PTIR bands are co-registered at acquisition. We initially perform linear equalization between the Amide I band image and each interpolated band image to preserve spectral information and adjust for absorption across different bands. Following equalization, we employ CurveLab 2.1.2 to reconstruct the interpolated image using the high-resolution image. We compute the curvelet transforms of the interpolated and Amide I images, then combine the low-frequency components of the interpolated image with the high-frequency components of the Amide I image, resulting in sharper edges in the reconstructed image. We compute the inverse curvelet transform on the combined data to get the sharpened high-resolution band image.
The schematic is presented in Figure 6.
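The fusion logic can be illustrated with a deliberately simplified stand-in. The sketch below does not use CurveLab or a curvelet transform: it replaces the curvelet decomposition with a plain Fourier low-pass/high-pass split, only to show how linear equalization and the combination of low-frequency and high-frequency content fit together. The function names, the least-squares form of the equalization, and the `cutoff` value are all assumptions made for illustration.

```python
import numpy as np


def linear_equalize(reference, target):
    """Linearly rescale `reference` so its intensities best match `target`
    in the least-squares sense (assumed form of the equalization step)."""
    slope, intercept = np.polyfit(reference.ravel(), target.ravel(), 1)
    return slope * reference + intercept


def radial_lowpass_mask(shape, cutoff):
    """Binary mask selecting spatial frequencies below `cutoff` (cycles/pixel)."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return (np.sqrt(fx ** 2 + fy ** 2) <= cutoff).astype(float)


def fuse_bands(interpolated, high_res, cutoff=0.05):
    """Keep low frequencies from the interpolated band and take high
    frequencies from the equalized high-resolution band.

    Fourier surrogate for the curvelet-domain fusion, NOT the CurveLab method.
    """
    equalized = linear_equalize(high_res, interpolated)
    mask = radial_lowpass_mask(interpolated.shape, cutoff)
    low = np.fft.fft2(interpolated) * mask
    high = np.fft.fft2(equalized) * (1.0 - mask)
    return np.fft.ifft2(low + high).real
```

A curvelet decomposition additionally separates content by orientation and scale, which a radial Fourier mask cannot do, so this surrogate should be read as a conceptual aid only.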
The sharpened image exhibits superior edge delineation compared to the interpolated image, as demonstrated in Figure 7. The high-resolution image, experimentally collected at a pixel size of 0.5 µm × 0.5 µm, shows edges and intensities akin to those in the reconstructed image. In contrast, the interpolated image appears blurred, with smoother edges, potentially diminishing the accuracy of CNNs that rely on both spectral and spatial information.

Data annotation
Based on H&E-stained microscopy data, two pathologists independently classified tissue cores as stroma, epithelium, or necrosis. H&E and IR images were manually aligned to generate annotated data for machine learning, and labels were subsequently transferred to O-PTIR images. The tissue microarray (TMA) was divided into two halves, ensuring an equal number of cores in each cohort: the right half was designated for training, while the left half was reserved for testing.

Classification Models and Hyperparameters
The hyperparameters for the random forest classifier and the convolutional neural network (CNN) remain consistent with those reported in our previous work 35. The primary enhancements in this study involve expanding the input from five to twenty-seven bands and increasing the quantity of training and testing data. Details on the total number of pixels allocated for testing and training are provided in Table 2.
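The Table 2 caption notes that equal numbers of pixels are selected from each class to prevent class bias. A minimal sketch of such class-balanced sampling is given below; the function name `balanced_pixel_indices` and the flat label array layout are assumptions, and sampling without replacement is our choice rather than a documented detail.

```python
import numpy as np


def balanced_pixel_indices(labels, n_per_class, seed=0):
    """Draw the same number of pixel indices from every annotated class.

    labels: 1-D integer array with one class label per annotated pixel.
    Returns a 1-D array of indices into `labels`.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(labels):
        candidates = np.flatnonzero(labels == cls)
        if candidates.size < n_per_class:
            raise ValueError(f"class {cls} has only {candidates.size} pixels")
        # Sample without replacement so no pixel is used twice
        chosen.append(rng.choice(candidates, size=n_per_class, replace=False))
    return np.concatenate(chosen)


if __name__ == "__main__":
    # Toy labels: 0 = epithelium, 1 = stroma, 2 = necrosis (imbalanced on purpose)
    toy_labels = np.array([0] * 5000 + [1] * 3000 + [2] * 1000)
    idx = balanced_pixel_indices(toy_labels, n_per_class=500)
    print(np.bincount(toy_labels[idx]))
```

In this work the per-class counts would be 10,000 pixels for the random forest and 400,000 for the CNNs, as stated in the Table 2 caption.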

Implementation
All data pre-processing, processing, training, and testing were performed in Python using open-source software packages. The CNNs were implemented in Python with the Keras library 36, and the random forest was implemented using the Scikit-learn library 37. An NVIDIA GeForce RTX 3090 GPU was used to measure the performance of the CNN classifier on five different sets of randomly selected training pixels.
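For orientation, a pixel-wise random forest on 27-band spectra can be set up in Scikit-learn along the following lines. This is a sketch on synthetic spectra, not the pipeline used in this study: the data generator, the class separation, and the hyperparameters (`n_estimators=100`) are placeholders and are not the values reported in our previous work 35.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

N_BANDS = 27
CLASSES = ("epithelium", "stroma", "necrosis")


def make_synthetic_spectra(n_per_class, seed):
    """Toy spectra: each class has elevated signal in its own block of 9 bands."""
    rng = np.random.default_rng(seed)
    spectra, labels = [], []
    for idx in range(len(CLASSES)):
        mean = np.zeros(N_BANDS)
        mean[idx * 9:(idx + 1) * 9] = 1.0
        spectra.append(rng.normal(mean, 0.5, size=(n_per_class, N_BANDS)))
        labels.append(np.full(n_per_class, idx))
    return np.vstack(spectra), np.concatenate(labels)


# Separate synthetic cohorts stand in for the training and testing halves of the TMA
X_train, y_train = make_synthetic_spectra(500, seed=0)
X_test, y_test = make_synthetic_spectra(200, seed=1)

clf = RandomForestClassifier(n_estimators=100, random_state=0)  # placeholder settings
clf.fit(X_train, y_train)
overall_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"overall accuracy on synthetic spectra: {overall_accuracy:.3f}")
```

Each row of `X_train` is the spectrum of a single pixel, which reflects why a random forest of this kind uses spectral but not spatial information.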

Results
We calculated the mean square error (MSE) and structural similarity index (SSIM) across various pixel spacings for four cores. SSIM evaluates the spatial feature similarity between reconstructed and original data, with values near 1 indicating high similarity. MSE measures the average pixel error, with lower values indicating better reconstruction. The means and standard deviations of these metrics are depicted in Figure 8. Both plots indicate that a pixel spacing of 0.5 µm by 5 µm yields favorable results compared to larger pixel spacings.
While smaller pixel spacings lead to improved outcomes, a balance must be struck between data collection efficiency and reconstruction accuracy.Therefore, we recommend a pixel spacing of 0.5 µm by 5 µm as an optimal parameter for data collection using this technique.
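The two image-quality metrics can be sketched as follows. The SSIM shown here is a simplified single-window (global) variant computed from whole-image statistics; SSIM is usually evaluated over local windows and averaged, and we do not know which implementation or window settings produced Figure 8. The constants `k1 = 0.01` and `k2 = 0.03` are the conventional defaults, and `data_range` is an assumed intensity range.

```python
import numpy as np


def mse(reference, reconstructed):
    """Mean squared error between two images of identical shape."""
    reference = np.asarray(reference, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return float(np.mean((reference - reconstructed) ** 2))


def global_ssim(reference, reconstructed, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM from whole-image means, variances, and covariance."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(reconstructed, dtype=float)
    c1 = (k1 * data_range) ** 2  # stabilises the luminance term
    c2 = (k2 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.random((64, 64))
    degraded = truth + rng.normal(0.0, 0.1, size=truth.shape)
    print(mse(truth, degraded), global_ssim(truth, degraded))
```

Repeating such a comparison for each pixel spacing gives the kind of spacing-versus-quality curve used to choose the 0.5 µm by 5 µm setting.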
Overall accuracy (OA) and receiver operating characteristic (ROC) curves were used to evaluate classifier performance. OA represents the percentage of pixels mapped correctly to the appropriate class for binary and multi-class classification. A ROC curve plots sensitivity (true positive rate) against 1 − specificity (false positive rate) across decision thresholds, which helps identify an acceptable trade-off between false positives and true positives.
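Both evaluation measures can be computed directly from per-pixel labels and classifier scores. The sketch below treats one tissue class against the rest and sweeps the decision threshold over the sorted scores; the function names are illustrative, and tied scores are not grouped, which a production implementation would normally handle.

```python
import numpy as np


def overall_accuracy(y_true, y_pred):
    """Fraction of pixels assigned to the correct class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def roc_curve_points(is_positive, scores):
    """One-vs-rest ROC: returns (false positive rate, true positive rate).

    Each pixel's score is used as a threshold in turn, from highest to lowest.
    """
    is_positive = np.asarray(is_positive, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    hits = is_positive[order]
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~hits) / (~hits).sum()))
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


if __name__ == "__main__":
    truth = np.array([1, 1, 0, 1, 0, 0])            # 1 = class of interest
    score = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
    fpr, tpr = roc_curve_points(truth, score)
    print(auc(fpr, tpr))
```

For the three-class problem here, one such curve per tissue subtype would be obtained by treating that subtype as the positive class.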
We performed tissue segmentation using the Random Forest (RF) classifier, which leverages spectral information, and Convolutional Neural Networks (CNNs), which utilize both structural and spectral information. The overall and class-wise accuracies for the testing dataset are detailed in Table 3. Compared to our previous work 35, the accuracy of the RF classifier improved by approximately 35%, attributed to the increase in the number of band images from five to twenty-seven. This expansion provides the RF classifier with more spectral information, leading to higher accuracies. As anticipated, CNNs surpass RF in performance, benefiting from their ability to incorporate structural information.
To address this challenge, we implemented sparse, interleaved sampling along the Y-axis while maintaining the sampling rate at the diffraction limit in the X-direction. Although various robust sampling methods, such as random and Lissajous sampling, are viable for data reconstruction, the commercial system's design facilitates rapid acquisition along the X-axis at high pixel density, whereas acquisition along the Y-axis is slow. Given these constraints, we opted for interleaved Y-sampling as the most efficient strategy to collect sparse data.
The choice of sampling pixel size relies on striking a balance between the time required to obtain data and the accuracy of data reconstruction. Table 1 outlines the time it takes to collect data for a specific area, with sampling pixel sizes ranging from 0.5 to 10 µm. In these sets of samples, collecting all 28 band images for each core at full resolution (0.5 × 0.5 µm pixel spacing) would take approximately 37 hours. In our method, by collecting 27 bands at 0.5 × 5 µm and one band at maximum resolution for reconstruction purposes, each core image takes about 5 hours to collect. This shows that data collection alone is shortened by a factor of about 7, including all overheads. To determine the best pixel spacing, we used three key metrics, namely SSIM, MSE, and classification accuracy, to compare reconstructed data with high-resolution data from the O-PTIR system. These metrics were
In our previous study, we classified epithelium and stroma in ovarian tissue using images from five specific wavenumbers with both random forest and CNN algorithms 35.
However, five bands carry insufficient spectroscopic information for identifying classes beyond epithelium and stroma, so a broader range of wavenumbers was needed. Here, we present a generalized approach to practically obtain hyperspectral data that opens new possibilities using O-PTIR. Inspired by results 38 indicating that a whole (1600-band) hyperspectral data cube is unnecessary for multi-class classification in FT-IR, we selected 27 wavenumbers to achieve efficient multi-class classification.
Our study reveals that a CNN significantly outperformed a random forest classifier, primarily because the latter depends on pixel-wise spectral data, whereas CNNs leverage spatial in addition to spectral features. The addition of 22 new reconstructed band images significantly improved the classification performance of the random forest classifier, as demonstrated in Figure 10 and Table 3. The performance of the random forest demonstrated the need for more spectral information, but obtaining additional band images would significantly increase the data acquisition time.
Comparing our results to previous research 35 on the binary classification of ovarian cancer, we observed that the accuracy of the random forest classifier increased from 53% with 5 bands to 87% with 27 bands, thanks to the richer spectral information. Similarly, using CNNs led to an accuracy improvement from 90% to 95% when employing 27 bands.
Note that our multi-class classification results outperform the prior binary classification results, which is a testament to the robustness and effectiveness of our approach. The classification outcomes for each algorithm are depicted in Figure 11, with red, green, and blue channels representing epithelium, stroma, and necrosis, respectively. Additionally, Figure 9 shows two cores containing all classes. An adjacent section, stained with H&E and annotated by a pathologist, shows a close alignment with our classification results.

Conclusion
We propose a novel high-speed Mid-Infrared Spectral Imaging (MIRSI) approach that reconstructs hyperspectral images using curvelets, addressing the significant bottleneck of

Figure 1 Schematic illustration of the O-PTIR optical configuration showing both the IR and green (532 nm) laser paths. A pulsed QCL at point (a) causes a photothermal expansion in the sample. A Continuous Wave (CW) green laser, indicated by (b), is collinearly directed onto the sample to serve as a probe beam. A dichroic mirror (c) merges the green and QCL beams, focusing them onto the sample (e) through a reflective Cassegrain objective (d). The resulting modulation in the intensity of the green light (f), scattered back from the sample, facilitates the measurement of its IR absorbance.
I to stage IIIC; histological subtypes include clear cell carcinoma, high-grade serous carcinoma, and mucinous adenocarcinoma. Before O-PTIR imaging, the paraffin-embedded samples were deparaffinized following a protocol along the lines described in Baker et al. 7: the sample was washed in 100% xylene twice for 5 minutes each and then in 100% ethanol three times. The corresponding adjacent histological section was stained with H&E and examined by an expert pathologist. Cell subtypes were identified across disease stages. We trained a random forest (RF) classifier and a CNN model using the 45 cores on the left half of the TMA for training and the remaining 55 cores on the right half of the TMA for testing, ensuring an appropriate number of pixels for each class in training and testing.

Figure 3 Microarray of ovarian cancer cores imaged by O-PTIR at the 1660 cm⁻¹ band. The data encompasses samples from 100 ovarian cancer patients. Variations in tissue biochemistry are highlighted by the color differences, demonstrating the rich biochemical information at the 1660 cm⁻¹ band, chosen for high-resolution reconstruction due to its significance in the fingerprint region. Scale bar: 1.5 mm.

Figure 6 Data fusion through curvelet transform involves taking the curvelet transform of both low-resolution band images and the high-resolution Amide I image. We obtain the high-frequency coefficients from the high-resolution image to achieve sharp edges, while low-frequency coefficients are obtained from the low-resolution band image. By combining these two and taking the inverse curvelet transform, we obtain the reconstructed band image.

Figure 7 Comparison of (a) interpolated, (b) computationally reconstructed, and (c) experimentally obtained high-resolution images. The comparison reveals that data collected at higher speeds with lower resolution can be effectively compensated for by our image sharpening method, which significantly improves upon the interpolated image.

Figure 9 Comparison of stained image (a) identified as ground truth by a pathologist, with classifications by RF (b) and CNN (c) for two ovarian tissue cores. The RF model demonstrates significant improvement over our previously published results, attributed to the increased number of bands. The CNN model, in turn, achieves classification comparable to that of a pathologist's analysis on a stained tissue microarray (TMA), owing to its utilization of both spectral and spatial information.

Figure 10 ROC curves and associated AUC values for each tissue subtype. CNN (blue line) demonstrates superior results compared to RF (dashed orange line) across all tissue subtypes: (a) epithelium, (b) necrosis, and (c) stroma.
technology enhances spectral data resolution from ≈ 5 µm to 0.5 µm, outperforming current state-of-the-art FT-IR systems. This advancement results in a 100× increase in pixel count over the same sample area, offering unprecedented spatial features and chemical information beyond what existing IR spectroscopic techniques can provide. However, this high resolution comes at the cost of slower data acquisition speeds, limited by the signal-to-noise ratio (SNR) and the stage speed of commercial O-PTIR imaging systems. Therefore, optimizing the hyperspectral data collection process for O-PTIR is essential.

Figure 11 Classification results of 100 cores with (a) RF and (b) CNN. Red, green, and blue channels correspond to epithelium, stroma, and necrosis, respectively. Scale bar: 1.5 mm.

Table 1
Pixel spacing (X-Y) versus acquisition time for single-band imaging (minutes needed to image a 1500 × 1500 µm region). As the pixel spacing increases, the acquisition time decreases, as shown in the second column. The trade-off is that larger pixel spacing leaves some data unsampled.

Table 2
Number of O-PTIR pixels in training and testing datasets separated by class. To create the training and testing cohorts, the TMA is divided in half. First, a small, random data set is chosen, and a classifier is optimized. To prevent class bias in training, equal numbers of pixels are selected from each class. 10,000 O-PTIR pixels per class are used for the RF classifier and 400,000 pixels per class for the CNNs.

Table 3
Accuracy scores for the classification of epithelium, stroma, and necrosis using (a) Random Forest (RF) and (b) Convolutional Neural Networks (CNNs), averaged across five repetitions. We determined the overall accuracy by calculating the weighted average accuracy of the classes. CNNs outperform RFs across all classes, which can be attributed to their ability to leverage both spatial and spectral features, whereas RFs rely solely on spectral features. This underscores the significance of integrating spatial and spectroscopic information to enhance tissue classification.